Conversation
|
Hey guys, since I switched to lorax and started contributing there a lot after the first license change, I'm happy to see this PR got opened. I'd be glad if you are open to some questions and discussion about it. I'd be happy to contribute here too.
|
hi @flozi00 thanks for the feedback! Can you share more about the lorax-style API? I see that in lorax you can specify the adapter via the
|
Yes, I mean the "adapter_id" inside "parameters" for the TGI API (as you did, I see now), and the "model" field in the OpenAI API :)
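To make the two request shapes being discussed concrete, here is a small sketch of the payloads (field names are the ones from this conversation; everything else, including the example adapter id, is illustrative):

```python
import json

# TGI-native API: the adapter goes in "parameters.adapter_id"
tgi_request = {
    "inputs": "What are 3 unique words that describe you?",
    "parameters": {"max_new_tokens": 40, "adapter_id": "predibase/customer_support"},
}

# OpenAI-compatible API: lorax overloads the "model" field with the adapter id
openai_request = {
    "model": "predibase/customer_support",
    "messages": [{"role": "user", "content": "What are 3 unique words that describe you?"}],
}

# Both are plain JSON bodies sent to the respective endpoints
print(json.dumps(tgi_request))
print(json.dumps(openai_request))
```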
|
update: This PR's implementation has been updated to align with the great work done by the lorax team. It reuses the same layers where possible and only diverges to accommodate TGI's recent updates/improvements; LoRA adapters are limited to loading at startup. The current changes allow weights to be loaded similarly to lorax, but there are still generation issues to resolve, along with other refactors.
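For readers unfamiliar with the underlying math, here is a minimal numpy sketch (not the actual TGI/lorax code) of what a LoRA adapter contributes: two low-rank matrices A (r x in) and B (out x r) whose product, scaled by alpha/r, is added to the base weight. The server keeps adapters separate so it can apply them per request, but the delta itself is just:

```python
import numpy as np

def apply_lora(base_weight, lora_a, lora_b, alpha):
    """Return the base weight with the low-rank LoRA delta added:
    W' = W + (alpha / r) * B @ A, where r is the LoRA rank."""
    r = lora_a.shape[0]
    scaling = alpha / r
    return base_weight + scaling * (lora_b @ lora_a)

# Toy shapes: out=4, in=6, rank r=2 (values are random, for illustration only)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
A = rng.normal(size=(2, 6))
B = rng.normal(size=(4, 2))
W_merged = apply_lora(W, A, B, alpha=16)
print(W_merged.shape)  # (4, 6)
```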
|
Looks like you are successfully adopting the lorax code |
|
@flozi00 generation with loras is mostly stable, just focusing on the rebase then refactors now. And thank you 🙂 a review once the PR is ready would be super helpful! |
|
Thanks for the shoutout in the docs! It's quite interesting to see things come full circle, maybe we should chat about merging our projects. |
|
of course @tgaddair, thank you for the awesome work! That's an interesting idea, and we are always aiming to improve TGI. We appreciate any contributions/discussions about features that may be helpful to our users
I'd love to migrate to tgi again 👍 And of course I'll try to contribute here too @tgaddair
|
hi @xiadingZ in this PR lora adapters are loaded from the
once this initial lora work is merged we'll follow up with other improvements, such as easier ways to specify the lora path
Hi @drbh, I can try your method with a downloaded lora, but I have a lora adapter trained locally. It doesn't have a directory structure such as
I set HUGGINGFACE_HUB_CACHE as
server/text_generation_server/models/custom_modeling/flash_llama_modeling.py
|
Forgot to add: we probably want an integration test as well. |
danieldk left a comment
Thanks for all the changes! Looks ready to merge to me after the small nit that breaks CI is fixed.
|
@danieldk thanks for the review! I've fixed the nits and CI passes. Going to go ahead and merge based on your last approval |
* feat: first draft load multiple lora
* feat: load weights within layer and refactor lora pass
* fix: refactor and reduce lora math
* feat: baseline impl single request multi lora support
* feat: prefer lorax implementation and port loading logic
* fix: prefer adapter_data and refactors
* feat: perfer loraxs custom punica kernels and add mlp loras
* fix: adjust batch for bgmv
* fix: adjust adapter_segments logic when in batch
* fix: refactor and move changes to v3 proto
* fix: pass model_id for all flash causal lms
* fix: pass model_id for all causal and seq2seq lms
* fix: add model_id to model test
* feat: add lora support to mistral and refactors
* feat: prefer model id in request
* fix: include rust code for adapter id
* feat: bump launcher and add new lora docs
* feat: support base model generation and refactors
* fix: rename doc to retry ci build
* feat: support if vlm models
* fix: add adapter_data param and avoid missing layers
* fix: add adapter_data param to phi and neox
* fix: update all models forwards to include adapter_data
* fix: add model_id to IdeficsCausalLM
* Update lora.md Fixed a typo
* Update lora.md Fixing spam image
* fix: add lora kernel to dockerfile, support running without kernels and refactors
* fix: avoid dockerfile conflict
* fix: refactors and adjust flash llama lora logic
* fix: skip llama test due to CI issue (temp)
* fix: skip llama test CI (temp) 2
* fix: revert skips and prefer updated ci token for tests
* fix: refactors and helpful comments
* fix: add noop in TensorParallelAdapterRowLinear too
* fix: refactor and move shard_lora_weights logic
* fix: exit early if no adapter_data

Co-authored-by: Derek <datavistics@gmail.com>
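The "adapter_segments" commits refer to a common trick when batching requests that use different adapters: contiguous runs of tokens sharing an adapter are grouped into segments so a grouped (SGMV/BGMV-style) kernel can process each run at once. A hypothetical sketch of that grouping (function and variable names are illustrative, not the repo's API):

```python
def build_adapter_segments(token_adapter_ids):
    """Return (segment_starts, segment_adapter_ids) for a flattened batch.

    segment_starts carries one extra trailing entry (the total length),
    so segment i covers token positions [starts[i], starts[i+1]).
    """
    starts, seg_ids = [], []
    for pos, adapter in enumerate(token_adapter_ids):
        # Open a new segment whenever the adapter changes
        if not seg_ids or seg_ids[-1] != adapter:
            starts.append(pos)
            seg_ids.append(adapter)
    starts.append(len(token_adapter_ids))
    return starts, seg_ids

# Three requests batched together: adapter 0, adapter 1, base model (None)
flat = [0, 0, 0, 1, 1, None, None, None]
print(build_adapter_segments(flat))
# ([0, 3, 5, 8], [0, 1, None])
```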

This PR is a work in progress to add support for multiple loras to be loaded at startup; a request can then use 0 or 1 adapters by specifying the adapter id.
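The "0 or 1 adapters per request" routing can be pictured with a small sketch (class and method names are hypothetical; the real server loads adapter weights rather than just recording ids):

```python
class AdapterRegistry:
    """Toy registry of adapters loaded once at startup."""

    def __init__(self, adapter_ids):
        self._known = set(adapter_ids)

    def resolve(self, adapter_id=None):
        """Return the adapter to use for a request, or None for the base model."""
        if adapter_id is None:
            return None  # no adapter_id in the request -> base model
        if adapter_id not in self._known:
            raise ValueError(f"adapter {adapter_id!r} was not loaded at startup")
        return adapter_id

registry = AdapterRegistry(["predibase/customer_support", "predibase/dbpedia"])
print(registry.resolve())                              # None -> base model
print(registry.resolve("predibase/customer_support"))  # routed to that adapter
```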
Example usage
download adapter without auto merging
start server with multiple LoRA adapters
sending request without adapter id
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "What are 3 unique words that describe you?",
        "parameters": { "max_new_tokens": 40 }
    }'

{ "generated_text": "\n\nI’m a very passionate person. I’m very driven. I’m very determined.\n\nWhat is your favorite thing about being a teacher?\n\nI love the fact" }

with first LoRA adapter specified
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "What are 3 unique words that describe you?",
        "parameters": { "max_new_tokens": 40, "adapter_id": "predibase/customer_support" }
    }'

{ "generated_text": "\n\nI’m not sure if I can come up with 3 unique words that describe me, but I’ll try.\n\n1. Creative\n2. Funny\n3." }

with second LoRA adapter specified
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "You are given the title and the body of an article below. Please determine the type of the article.### Title: Great White Whale\n\n### Body: Great White Whale is the debut album by the Canadian rock band Secret and Whisper. The album was in the works for about a year and was released on February 12 2008.",
        "parameters": { "max_new_tokens": 40, "adapter_id": "predibase/dbpedia" }
    }'

{ "generated_text": "8" }